The Development of the "Index Thomisticus" Treebank Valency Lexicon

نویسندگان

  • Barbara McGillivray
  • Marco Passarotti
چکیده

We present a valency lexicon for Latin verbs extracted from the Index Thomisticus Treebank, a syntactically annotated corpus of Medieval Latin texts by Thomas Aquinas. In our corpus-based approach, the lexicon reflects the empirical evidence of the source data. Verbal arguments are induced directly from annotated data. The lexicon contains 432 Latin verbs with 270 valency frames. The lexicon is useful for NLP applications and is able to support annotation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Index Thomisticus Treebank Project: Annotation, Parsing and Valency Lexicon

We present an overview of the Index Thomisticus Treebank project (IT-TB). The ITTB consists of around 60,000 tokens from the Index Thomisticus by Roberto Busa SJ, an 11million-token Latin corpus of the texts by Thomas Aquinas. We briefly describe the annotation guidelines, shared with the Latin Dependency Treebank (LDT). The application of data-driven dependency parsers on IT-TB and LDT data is...

متن کامل

Selectional Preferences from a Latin Treebank

We present a system for automatically acquiring selectional preferences for Latin verbs. We use the Index Thomisticus Treebank Valency Lexicon and an enriched version of Latin WordNet as the reference conceptual hierarchy.

متن کامل

A Syntactic Valency Lexicon for Persian Verbs: The First Steps towards Persian Dependency Treebank

Valency lexicons are valuable resources for natural language processing. The need for new resources for languages encourages researchers to collect new datasets. One of the most important datasets is valency lexicons. In valency lexicons, information about obligatory and optional complements of words is annotated at the syntactic and semantic levels. In this paper, we report the development of ...

متن کامل

Valency in the Prague Dependency Treebank: Building the Valency Lexicon

In this article we focus on valency, which belongs to the core phenomena being captured in the underlying level of the Prague Dependency Treebank (PDT). We present a summary of the basic principles of the applied theoretical framework including proposals for suitable refinement relevant to NLP. The current status of description of valency behavior of verbs, nouns and adjectives is outlined. We ...

متن کامل

Czech-English Bilingual Valency Lexicon Online

We describe CzEngVallex, a bilingual Czech–English valency lexicon which aligns verbal valency frames and their arguments. It is based on a parallel Czech-English corpus, the Prague Czech-English Dependency Treebank (PCEDT), where for each occurrence of a verb, a reference to the underlying Czech and English valency lexicons (PDT-Vallex and CzEngVallex, respectively) is recorded. The CzEngValle...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009